Intelligent video navigation techniques

ABSTRACT

Automatic replay or skip ahead functionality can be configured to intelligently navigate to a portion of a video a user desires to view. The context at which a user selects intelligent navigation can be analyzed to determine where to initiate automatic replay or skip ahead. The context for intelligent navigation can be based on scene or shot segmentation data, closed captioning, aggregate video navigation data from a community of users of shared demographic traits and/or interests, and/or other metadata. In the case of automatic replay, playback of a portion of a video can include enhancements for that portion, such as providing closed captioning, displaying at a decreased frame rate (“slow motion”), or zooming in/out on a portion of the frames of a video segment, among other enhancements.

BACKGROUND

Applications such as video-on-demand, video-sharing, digital video broadcasting, massive open online courses (MOOCs) or distance education, among other uses of digital video, are becoming increasingly popular. An advantage of digital video over analog video is the relative ease with which users can navigate digital videos. For example, a conventional approach for navigating a digital video is the use of a “scrubber” that enables a user to quickly “fast-forward” by moving the scrubber forward and to quickly “rewind” by moving the scrubber backward. Another conventional approach for navigating a digital video is to provide a “skip ahead” button that fast-forwards a video by a specified number of seconds and a “playback” or “replay” button that “rewinds” the video by a specified number of seconds. However, these techniques of using a scrubber or skipping forward or reversing backward by a specified number of seconds may not accurately reflect where in the digital video the user intended to navigate.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

FIGS. 1A and 1B illustrate a conventional approach for providing navigation for a digital video;

FIGS. 2A and 2B illustrate an example approach for providing intelligent navigation for a digital video that can be used in accordance with an embodiment;

FIG. 3 illustrates an example approach for providing intelligent navigation for a digital video that can be used in accordance with an embodiment;

FIG. 4 illustrates an example approach for enhancing a digital video that can be used in accordance with an embodiment;

FIG. 5 illustrates an example approach for enhancing a digital video that can be used in accordance with an embodiment;

FIG. 6 illustrates an example process for enabling intelligent video navigation that can be used in accordance with an embodiment;

FIG. 7 illustrates an example process for enabling intelligent video navigation that can be used in accordance with an embodiment;

FIG. 8 illustrates an example computing device that can be used in accordance with various embodiments;

FIG. 9 illustrates an example configuration of components of a computing device such as that illustrated in FIG. 8; and

FIG. 10 illustrates an environment in which various embodiments can be implemented.

DETAILED DESCRIPTION

Systems and methods in accordance with various embodiments of the present disclosure may overcome one or more of the aforementioned and other deficiencies experienced in conventional approaches for enabling a user to navigate a digital video. In some embodiments, a user can select to replay a portion of a video, and based on the context of the video, the video will automatically be navigated to the portion the user most likely intended to replay. For example, the video may be replayed at the beginning of a shot or a scene depending upon how much of the shot or the scene has already been played. As another example, the video may be replayed at the beginning of dialogue, such as the start of a monologue, narration, conversation between multiple characters, among other possibilities.

In various embodiments, a segment of a video that is replayed can automatically be enhanced to improve video playback. For instance, in certain embodiments where a segment of video to be replayed includes dialogue, closed captioning can be presented during that segment. In some embodiments where a section of video corresponds to an action scene or action shot, the section can be played back in “slow motion” or at a decreased frame rate. In at least some such embodiments, sound and/or dialogue can automatically be adjusted to account for the slow motion or decreased frame rate, such as by increasing the length of time between gaps in sound and/or dialogue. In some embodiments, extrinsic data, such as cast and character images and information, biographical information, quotations, trivia, goofs, related offerings, scene information, and other extrinsic data may be presented during replay. In some embodiments, a replayed segment can automatically be zoomed in or zoomed out on a particular portion of one or more frames of the replayed segment.

In some embodiments, the context of a section of video that is replayed can be based on scene or shot segmentation metadata. That is, a video may be segmented according to scenes and/or shots, and this data can be used to determine where in the video to begin replay. In other embodiments, replay context may be dependent upon analysis of closed captioning associated with a video, speech-to-text translation of dialogue, or similar text-based approaches. In some embodiments, video replay context may be based on aggregated data of users who have previously viewed the video. For example, a threshold number of previous viewers may have requested playback of a particular segment of the video, such as due to that segment including unintelligible audio or due to that segment being especially compelling to the previous viewers. As another example, a threshold number of previous viewers may have turned up the volume at a particular segment of the video, indicating that the segment incorporates inaudible audio. Such data can be collected, and when a current viewer requests to replay that particular segment, playback can be based on the aggregated data. In certain embodiments, some combination of these various approaches, among others, can be utilized to determine where to begin playback of a video.

In some embodiments, a user can select to skip a portion of a video, and the video will automatically be forwarded to a subsequent section of the video using techniques similar to those discussed above. For example, the user can automatically skip to the end of a scene or a shot or the conclusion of certain dialogue. In other embodiments, scenes or shots may be associated with certain metadata that can be used to automatically advance a video to a later segment. This metadata can include information classifying a shot or a scene as corresponding to opening credits or closing credits, information indicating that the shot or scene includes objectionable material (e.g., nudity, adult language, violence, etc.), information indicating an end point of dialogue, information indicating an end point of a song or score incorporated within the shot or scene, among other possibilities. In still other embodiments, a scene can be automatically advanced based on the collective behavior of viewers who have previously viewed the video. In yet still other embodiments, some combination of these approaches, among others, can be used to determine automatic video fast-forwarding.
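
For illustration only, the following Python sketch shows one way metadata-tagged segments could drive automatic skip-ahead as described above; the segment schema, tag names, and timing values are hypothetical assumptions, not part of this disclosure.

    # Hypothetical segment metadata: (start_sec, end_sec, tags).
    SEGMENTS = [
        (0.0, 45.0, {"opening_credits"}),
        (45.0, 310.0, {"dialogue"}),
        (310.0, 355.0, {"objectionable"}),
        (355.0, 5400.0, set()),
    ]

    SKIPPABLE = frozenset({"opening_credits", "closing_credits", "objectionable"})

    def skip_ahead(current_sec):
        """Advance to the end of the enclosing scene/shot, then past any
        immediately following segments tagged as skippable."""
        target = None
        for start, end, tags in SEGMENTS:
            if start <= current_sec < end:
                target = end                      # end of the current segment
            elif target is not None and start == target and tags & SKIPPABLE:
                target = end                      # also skip a tagged segment
        return target if target is not None else current_sec

    # E.g., skip_ahead(20.0) -> 45.0, and skip_ahead(300.0) -> 355.0 (the
    # dialogue ends at 310.0 and the tagged segment after it is skipped too).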

Various other functions and advantages are described and suggested below as may be provided in accordance with the various embodiments.

FIGS. 1A and 1B illustrate a conventional approach for enabling a user to navigate a digital video. In the example situation 100 of FIG. 1A, a video of President Abraham Lincoln is shown on display screen 104 of computing device 102. At a certain sequence, President Lincoln is delivering the Gettysburg Address, in particular, the words 112, “and dedicated to the proposition that all men are created equal,” which are output via speaker 106 of the computing device. The application, operating on computing device 102 and playing the video of President Lincoln presenting the Gettysburg Address, includes a conventional user interface element 108 that enables a user to navigate to a portion of the video 10 seconds back in time. User interface element 108 is sometimes referred to as a playback, replay, or skip back button. User interface element 108 is provided, for example, in the event that a user didn't understand or missed a portion of dialogue, to enable the user to play back or replay the missed dialogue. As another example, a particular video sequence may be especially compelling to a user, and the playback or replay button enables the user to re-watch the sequence.

In the example situation 120 of FIG. 1B, a user (not shown) has clicked on user interface element 108, causing a previous portion of the video 10 seconds back in time to be displayed on display screen 104 of computing device 102. Upon clicking the playback or replay button 108, speaker 106 outputs a portion of the Gettysburg Address at the words 122, “our fathers brought forth on this continent.” The conventional approach thus plays a previous portion of a video a static amount of time (i.e., 10 seconds) back in time. This static approach may not navigate to the portion of the video the user is actually interested in re-watching but instead, as here, navigates to the middle of dialogue (i.e., the middle of the first sentence of the Gettysburg Address). A user is likely to be more interested in playing back a sequence of the video from the beginning of a scene or shot, or from the beginning of the dialogue, which is oftentimes not a static amount of time back in time.

FIGS. 2A and 2B illustrate an example approach for enabling a user to navigate a digital video that can be used in accordance with an embodiment. In the example situation 200 of FIG. 2A, a video of President Abraham Lincoln is presented on display screen 204 of computing device 202. Although a portable computing device (e.g., a smart phone, tablet computer, or portable digital media player) is shown that can be held in a user's hands, it should be understood that other types of computing devices can utilize aspects of the various embodiments as should be apparent in light of the teachings and suggestions contained herein. These other types of computing devices can include, for example, desktop computers, notebook computers, video gaming consoles, televisions, television set top boxes, digital video disc (DVD) players, digital media players, network appliances, among others. At a particular sequence in time, President Lincoln is delivering the words 212, “and dedicated to the proposition that all men are created equal,” of the Gettysburg Address. This portion of the Gettysburg Address is output via speaker 206, located on the front of the device and on the same surface as the display screen, to direct audio toward the front of the device, such as toward a user (not shown) viewing the display screen. It should be understood that, while the components of the example device are shown to be on a “front” of the device, there can be similar or alternative components on the “sides” or “back” of the device as well (or instead). Further, directions such as “front,” “side,” and “back” are used for purposes of explanation and are not intended to require specific orientations unless otherwise stated. In some embodiments, a computing device may include more than one speaker on the front of the device and/or one or more speakers on the back (and/or sides) of the device.

The example situation 200 of FIG. 2A is similar to the example situation 100 of FIG. 1A. However, in the example 200 of FIG. 2A, the application operating on computing device 202 and playing the video of President Lincoln includes a user interface element 208 that provides for “intelligent” navigation to a previous portion of the video based on the context of the current portion of the video being played. In this example, an “i” icon is used to indicate that user interface element 208 provides for intelligent playback or replay. In other embodiments, a dynamic amount of time can be displayed to indicate the previous portion of the video that will be played back upon selection of the intelligent replay button. For example, if the context of the current portion of the video indicates that replay should occur 8.9 seconds back in time, “8.9 s” can be displayed in lieu of the “i” icon. Further, it will be appreciated that the intelligent replay button 208 is not necessarily always displayed during the presentation of a video. Instead, in various embodiments, a user interaction during the playing of the video is required to cause the intelligent replay button 208 to be displayed and to be selectable (e.g., pressing a remote control button, moving a virtual cursor over the video, touching or hovering a finger over a touchscreen displaying the video, etc.).

As mentioned, in some embodiments, the context for automatic video navigation can be based on scene or shot segmentation data. Video segmentation generally involves the partitioning of a video into its constituent parts, such as scenes, shots, and frames. A scene comprises a series of consecutive shots grouped together because, for example, they are captured in the same location or they share thematic content. A shot can be a sequence of frames recorded contiguously and representing a continuous action in time or space. A shot can also be an unbroken sequence of frames captured by a single camera. A frame is a single still image of a video. For example, a 90-minute film shot at 24 frames per second will contain 129,600 frames. Approaches for segmenting a video are discussed in co-pending U.S. patent application Ser. No. 14/577,277, filed Dec. 14, 2015, and entitled “Video Segmentation Techniques,” which is hereby incorporated herein by reference. Once a video has been segmented according to scene or shot, the segmentation data can be used for determining how far backward (or forward) to navigate the video when a user selects to replay (or fast-forward) a particular section of the video. For instance, in at least some embodiments, pressing a replay (or skip ahead) button can result in a video being navigated to a beginning (or end) of a scene or a shot.
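
As a concrete illustration of using shot segmentation data to pick a playback point, consider the following Python sketch; the boundary timestamps and the 8-to-12-second replay window are assumed values for illustration.

    # Hypothetical shot boundaries, in seconds from the start of the video.
    SHOT_BOUNDARIES = [0.0, 14.2, 27.9, 41.5, 63.0, 88.7]

    def replay_point(current_sec, min_back=8.0, max_back=12.0):
        """Return the timestamp at which to initiate replay.

        Prefer the latest shot boundary inside the replay window
        [current - max_back, current - min_back]; if none falls there,
        fall back to a static replay of min_back seconds.
        """
        lo, hi = current_sec - max_back, current_sec - min_back
        candidates = [b for b in SHOT_BOUNDARIES if lo <= b <= hi]
        return max(candidates) if candidates else max(current_sec - min_back, 0.0)

    # E.g., a replay request at 50.0 s initiates playback from the shot
    # boundary at 41.5 s, which lies inside the 38.0-42.0 s window.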

In other embodiments, the context for automatic video navigation can be based on the dialogue corresponding to a section of video. For example, clicking on a replay (or skip ahead) button by a user can cause a video to be navigated based on closed captioning cues. Closed captioning is typically embedded in a video (e.g., CEA-608, CEA-708, among others) or stored as a separate file (e.g., TTML, DFXP, SMPTE-TT, SCC, EBU-TT, EBU-STL (binary), WebVTT, among others). CEA-608, also sometimes referred to as Line 21, is the National Television System Committee (NTSC) analog television standard used in the United States and Canada. In CEA-608, captions are encoded directly into a hidden area of the video stream by broadcasting devices. CEA-708 is the Advanced Television Systems Committee (ATSC) digital television standard used in the United States and Canada. Timed Text Markup Language (TTML) is a markup language that provides for the synchronization of text and other media, such as audio or video. Distribution Format Exchange Profile (DFXP) is a particular implementation of TTML that defines when and how to display caption data. Society of Motion Picture and Television Engineers-Timed Text (SMPTE-TT) is an extension of DFXP that adds support for three extensions found in other captioning formats and informational items but not found in DFXP: #data, #image, and #information. SMPTE-TT is also the FCC Safe Harbor format; if a content provider produces captions in SMPTE-TT format, the provider has satisfied its obligations to provide captioning in an accessible format. Scenarist Closed Caption (SCC) format contains SMPTE timecodes with corresponding encoded caption data as a representation of CEA-608 data. EBU-TT is a strict subset of TTML supported by the European Broadcast Union (EBU); that is, all EBU-TT files are valid TTML documents, but not all TTML documents are valid EBU-TT files. EBU-STL is a binary format used by the EBU, stored as separate .STL files. Synchronized Accessible Media Interchange (SAMI) is based on HTML. WebVTT is a proposed standard for HTML5 video closed captioning. In at least some embodiments, closed captioning data can be utilized for enabling automatic video navigation. For example, if a user selects to replay (or fast-forward) a first section of video, the closed captioning data may be analyzed to determine a section of the video marking the beginning (or end) of a monologue, narration, or conversation from which automatic navigation can be initiated.
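
One possible realization of caption-based navigation is sketched below in Python: it scans WebVTT-style cues for the start of the dialogue run containing the current position. The cue list and the 2-second silence heuristic are illustrative assumptions.

    # Hypothetical WebVTT-style cues: (start_sec, end_sec, text).
    CUES = [
        (40.0, 43.5, "Four score and seven years ago"),
        (43.6, 47.0, "our fathers brought forth on this continent"),
        (47.1, 50.2, "a new nation, conceived in liberty"),
    ]

    DIALOGUE_GAP = 2.0  # silence longer than this marks a new dialogue run

    def dialogue_start(current_sec):
        """Return the start of the dialogue run containing current_sec,
        or None if no cue covers that time."""
        start = None
        prev_end = None
        for cue_start, cue_end, _ in CUES:
            if prev_end is None or cue_start - prev_end > DIALOGUE_GAP:
                start = cue_start          # a new dialogue run begins here
            if cue_start <= current_sec <= cue_end:
                return start
            prev_end = cue_end
        return None

    # E.g., dialogue_start(48.0) returns 40.0, the beginning of the
    # sentence, rather than a point 10 seconds back in mid-dialogue.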

In still other embodiments, aggregate data from a community of users can be utilized for enabling automatic video navigation. In some embodiments, the community of users may share a demographic trait, such as age, gender, geographic location, income bracket, among others, with the specific user. In other embodiments, the community of users may share common interests, such as viewing, purchasing, recommending, and/or rating similar products and/or consuming common media items (e.g., video, music, books, video games, apps, etc.). In still other embodiments, the community of users can be based on a combination of shared demographic traits and interests. Once a community of users has been identified, their interactions with videos can be monitored and utilized for automatic video navigation for a particular user. For example, if a threshold number of users request playback (or skip ahead) from a first section of video to a second earlier (or later) section of video, such data can be a good indication of where to initiate automatic video navigation because it is more likely that the particular user would also prefer to navigate from the first section to the earlier (or later) second section.
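
A minimal sketch of such aggregation, assuming hypothetical logged navigation events of the form (from_sec, to_sec): pick the destination most often chosen by community members who replayed from near the requesting user's position, subject to a threshold.

    from collections import Counter

    # Hypothetical logged replay events from a community: (from_sec, to_sec).
    COMMUNITY_REPLAYS = [
        (50.2, 41.5), (50.9, 41.5), (49.8, 41.5), (51.0, 38.0),
    ]

    def community_replay_point(current_sec, radius=2.0, threshold=3):
        """Return the destination most often chosen by community members
        who replayed from within `radius` seconds of current_sec, if at
        least `threshold` members chose it; otherwise None."""
        nearby = [to for frm, to in COMMUNITY_REPLAYS
                  if abs(frm - current_sec) <= radius]
        if len(nearby) < threshold:
            return None
        dest, count = Counter(nearby).most_common(1)[0]
        return dest if count >= threshold else None

    # E.g., community_replay_point(50.0) returns 41.5, since three of the
    # four nearby viewers replayed to that point.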

In yet still other embodiments, other data associated with a digital video can be utilized for automatic video navigation, such as audio data (e.g., background song, score, or other audio), data regarding actors appearing in a scene, other text data (e.g., subtitles, location text, etc.), or other metadata tags or associations (e.g., action scene, opening credits, closing credits, etc.). In some embodiments, a combination of these approaches can be used for automatic video navigation, such as using a weighted combination based on the context at which replay or skip ahead is selected; using a first approach based on a first context, a second approach based on a second context, a third approach based on a third context, etc.; using multiple approaches at once and selecting the approach associated with a highest level of confidence; using multiple approaches at once and selecting a default approach when no single approach meets a threshold level of confidence; using multiple approaches at once and selecting a mean, median, or mode; among other possibilities. Various approaches known to those of ordinary skill in the art for combining data can be utilized within the scope of the various embodiments.
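
One of these combination strategies, selecting the most confident approach with a default fallback, might be sketched as follows; the (timestamp, confidence) convention and the 0.6 threshold are assumptions for illustration.

    def combine_strategies(current_sec, strategies, default_back=10.0,
                           min_confidence=0.6):
        """Run all navigation strategies and keep the most confident
        result; fall back to a static default when none is confident.

        Each strategy takes the current time and returns either None or
        a (timestamp, confidence) pair with confidence in [0, 1].
        """
        best_point, best_conf = None, 0.0
        for strategy in strategies:
            result = strategy(current_sec)
            if result is None:
                continue
            point, conf = result
            if conf > best_conf:
                best_point, best_conf = point, conf
        if best_point is None or best_conf < min_confidence:
            return max(current_sec - default_back, 0.0)  # static fallback
        return best_point

    # The strategies could, for instance, wrap the segmentation-, caption-,
    # and community-based functions sketched above, each paired with a
    # confidence estimate.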

FIG. 3 illustrates an example approach for providing navigation for a digital video that can be used in accordance with an embodiment. In particular, FIG. 3 depicts an approach for replaying a portion of a digital video utilizing the automatic navigation techniques discussed herein. In the example 300 of FIG. 3, a 90-minute digital video is represented as a timeline 302 in increments of seconds from 0 to 5400 seconds. A user (not shown) views the video up to a point in time 304, which can represent a frame of the digital video (e.g., 1/24th of a second if the video is shot at 24 frames per second (fps)). At that point in time or frame 304, the user desires to replay a portion of the video and requests playback by clicking an intelligent replay button. In this example, a replay window 306 is identified for determining where in the video to initiate playback. In an embodiment, the replay window comprises a period of time between a minimum replay threshold (e.g., 8 seconds or 192 frames) and a maximum replay threshold (e.g., 12 seconds or 288 frames). In various embodiments, the minimum replay threshold can be as short as 3 seconds and the maximum replay threshold can be as long as 5 minutes. In some such embodiments, holding down a replay button or clicking on the replay button multiple times can cause replay to be initiated from a point further back in time or a point based on coarser segmentation data (e.g., DVD chapter segmentation data). In some embodiments, only one of a minimum replay threshold or a maximum replay threshold is utilized. In other embodiments, no thresholds are utilized, and either all previously played portions of a video are analyzed to determine the point of playback or some other heuristic can be utilized for determining the point of playback (e.g., navigating back to the beginning of a scene or a shot, the beginning of dialogue, the beginning of a background song or score, etc.).

In this example, after the replay window has been identified, the replay window can be analyzed to determine a point of playback 308 from where replay of the video is initiated. As discussed elsewhere herein, the point of playback can be based on video segmentation data, closed captioning, cumulative data from a community of users, other metadata, or a combination thereof. For example, the point of playback 308 can represent the beginning of a scene or a shot corresponding to point or frame 304, the beginning of dialogue according to closed captioning data, a playback point selected by a community of users, and/or the beginning of a musical score, among other possibilities. In some embodiments, automatic playback can be accompanied by an enhancement between frames 308 and 304, such as the segment of video between points or frames 308 and 304 being played in slow motion or at a decreased frame rate, closed captioning being added to the segment, extrinsic data being displayed during the segment, or zooming in/zooming out on a particular portion of the frames of the segment, among other enhancements. Although example 300 of FIG. 3 illustrates automatic replay, it will be appreciated by one of ordinary skill in the art that similar techniques can be utilized for enabling automatic fast-forwarding.

FIG. 4 illustrates an example approach for enhancing a digital video that can be used in accordance with an embodiment. In the example of FIG. 4, a stationary camera (not shown) is utilized to capture video of a moving car when the car is centered within the field of view of the camera, as seen in frame 402 a; as the car moves outside the center of the field of view of the camera, as seen in frame 404; and as the car moves outside the field of view of the camera, as seen in frame 406. It will be appreciated that there are additional frames between frames 402 a and 404 and between 404 and 406, but only frames 402 a, 404, and 406 are provided for illustrative purposes. At a point in time corresponding to frame 406, a user (not shown) elects to play back a portion of the video and performs a user interaction to cause user interface element 408, an automatic replay button, to be displayed and to be selectable. For example, the user may tap a touchscreen of a portable computing device displaying the video or use a finger to hover over the touchscreen, use a mouse or other input element to move a virtual cursor over a video application playing the video, or press a remote control button of a television set, digital media player, or other video display appliance, among other possibilities. In this example, the scene or shot corresponding to frames 402 a, 404, and 406 has been annotated in metadata as an action scene. Selection of the automatic replay button 408 causes playback to be initiated from the beginning of the shot or scene, as seen in frame 402 b. In this example, the selection of the automatic replay button also causes the video segment to be played in slow motion or at a decreased frame rate because the scene is characterized as an action scene. Thus, instead of viewing the car as it moves from the center of the field of view of the camera until it is completely outside the field of view of the camera, as seen in frames 402 a, 404, and 406, the car can be seen from the center of the field of view of the camera as it slowly goes outside, but not completely outside, the field of view of the camera, as seen in frames 402 b, 410, and 412.

In some embodiments, a user selection of intelligent playback automatically causes a portion of a video to be played in slow motion or at a slower frame rate, such as when a scene or shot is characterized as an action scene or action shot. In other embodiments, users can manually cause a portion of a video to be played in slow motion or at a slower frame rate, such as by holding down the intelligent replay button or tapping the intelligent replay button multiple times. In some embodiments, a portion of a video can be replayed at different, slower frame rates. For example, holding down the intelligent replay button for 1 second will cause a portion of a video to be replayed at 2× the normal frame rate, holding the button down for 2 seconds will cause the portion of the video to be replayed at 4× the normal rate, holding down the button for 3 seconds will cause the portion of the video to be replayed at 16× the normal frame rate, etc. Further, holding down the intelligent replay button for an extended period of time can cause cycling of the different frame rates, and releasing the button will result in playback at the last displayed frame rate. In other embodiments, double tapping can cause the portion of the video to be replayed at 2× the normal frame rate, triple tapping can cause the portion of the video to be replayed at 4× the normal frame rate, etc.
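
Reading the multipliers above as slow-motion factors (divisors of the normal frame rate), a sketch of mapping hold duration to a playback frame rate might look like the following; the thresholds and event handling are hypothetical.

    import bisect

    # Hold-duration thresholds (seconds) and slow-motion factors; per the
    # example above, a longer hold selects a stronger slow-motion effect.
    HOLD_THRESHOLDS = [1.0, 2.0, 3.0]
    SLOW_FACTORS = [1, 2, 4, 16]  # 1 = normal speed

    def slow_factor_for_hold(hold_sec):
        """Map how long the replay button was held to a slow-motion factor."""
        return SLOW_FACTORS[bisect.bisect_right(HOLD_THRESHOLDS, hold_sec)]

    def playback_frame_rate(normal_fps, hold_sec):
        """Frame rate at which the replayed segment is displayed."""
        return normal_fps / slow_factor_for_hold(hold_sec)

    # E.g., playback_frame_rate(24, 2.5) -> 6.0 fps (4x slow motion);
    # a hold shorter than 1 second leaves playback at the normal rate.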

In at least some embodiments, audio data can also be modified to correspond to a decreased frame rate using time compression/expansion or time stretching. As known in the art, time stretching leaves the pitch of a signal intact while changing its speed or tempo. There are two primary time compression/expansion techniques: the phase vocoder and Pitch Synchronized Overlap-Add (PSOLA). The phase vocoder uses a Short Time Fourier Transform (STFT) to convert the audio signal to the Fourier representation. As the STFT returns the frequency domain representation of the audio signal at a fixed frequency grid, the actual frequencies of the partial bins can be found by converting the relative phase change between two STFT outputs to actual frequency changes. The timebase of the audio signal can be changed by calculating the frequency changes in the Fourier domain on a different time basis, and then an inverse STFT is computed to regain the time domain representation of the signal. PSOLA is based on a correct estimate of the fundamental frequency of the processed audio signal. In one implementation, the Short Time Average Magnitude Difference function is calculated to find the minimum value. The timebase is changed by copying the input to the output in an overlap-and-add manner while simultaneously incrementing the input pointer by the overlap size minus a multiple of the fundamental period. This results in the input being traversed at a different speed than the original data while aligning the estimated fundamental period.
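
As a sketch of stretching an audio track to match a segment replayed in slow motion, the following uses the phase-vocoder-based time_stretch from the librosa library (an assumed dependency and just one of many possible implementations); the file names are placeholders.

    import librosa
    import soundfile as sf

    def stretch_audio_for_slow_motion(in_path, out_path, slow_factor):
        """Time-stretch audio so it spans slow_factor times its original
        duration while preserving pitch (phase-vocoder based)."""
        y, sr = librosa.load(in_path, sr=None, mono=True)
        # A rate below 1 slows the audio down; rate = 1/slow_factor matches
        # a segment replayed at 1/slow_factor of the normal frame rate.
        stretched = librosa.effects.time_stretch(y, rate=1.0 / slow_factor)
        sf.write(out_path, stretched, sr)

    # E.g., stretch_audio_for_slow_motion("segment.wav", "segment_4x.wav", 4)
    # produces audio four times as long, aligned with 4x slow-motion video.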

FIG. 5 illustrates an example approach for enhancing a digital video that can be used in accordance with an embodiment. In particular, the example video segment 500 comprises a couple talking at a restaurant, with a first camera capturing a profile view of both the man and the woman, as shown in frame 502 a; a second camera capturing a front view of the man as he speaks, as depicted in frame 504; and a third camera capturing a front view of the woman as she speaks, as illustrated in frame 506. Again, it should be understood that there can be additional frames between frames 502 a and 504 and between 504 and 506. At a particular point in time corresponding to frame 506, a user (not shown) performs an interaction causing a user interface element 508, an intelligent replay button, to be displayed and to be selectable. The user subsequently clicks on the intelligent replay button, causing a portion of the video to be replayed from the beginning of the segment, at frame 502 b. In this example, video segment or scene 500 includes dialogue. Therefore, upon replay, the scene 500 is augmented with closed captioning 510.

In some embodiments, a user selection of intelligent playback automatically causes a portion of a video to be presented with closed captioning, such as when a scene or shot includes dialogue. In other embodiments, users can manually cause a portion of a video to be played with closed captioning, such as by holding down the intelligent replay button or tapping the intelligent replay button multiple times. In some embodiments, holding down the intelligent replay button or multiple clicks of the intelligent replay button can enable different modalities to be selected by a user. For example, in an embodiment, holding down the replay button can enable a user to select to review a scene in slow motion, review the scene with closed captioning, review the scene with extrinsic data, or review a zoomed in/zoomed out perspective of the scene, among other possible enhancements.

Examples of the extrinsic data that can be presented may include names or descriptions of performers in a video, biographies or filmographies of the performers, commentary, trivia, mistakes, user comments, image data, and/or other data. The extrinsic data may include curated data that is professionally managed, verified, or otherwise trustworthy, and/or data from non-editorially curated sources (e.g., “Wiki” sources). For example, the extrinsic data may include cast/crew data, quote/trivia data, soundtrack data, product data, and/or other data. The cast/crew data can include the name, biographical data, character information, images, and/or other data describing cast members who perform in a video or crew members who are involved in the production of the video. The biographical data may include various information such as stage name, birth name, date of birth, date of death, an editorially curated biography, and/or other information.

The quote/trivia data may include various quotations from characters, trivia items, goofs, and other interesting tidbits of information for the video and may be correlated with times of appearance in the video and/or scenes of appearance in the video. The soundtrack data may include various information about the audio of the video. For example, the soundtrack data may identify that a particular audio track is being used at a certain time in the video or during a certain scene of the video. The soundtrack data may indicate whether the audio corresponds to a title or theme track. In addition, the soundtrack data may identify performers who vocally perform characters in the audio. Such performers may be considered cast members. However, such performers may differ from cast members who visually perform the same characters in some cases. One such case is where, for example, a song is recorded by a vocalist and a different performer merely lip-syncs to the recorded song in the video.

The product data may identify associations of products with times or scenes in a video. The products may correspond to any item offered for purchase, download, rental, or other form of consumption. For example, a particular brand of potato chips may be shown and/or mentioned in dialogue of a movie. The product data may be used to promote products that are related to various scenes in the video at the appropriate times. Such promotions may be rendered relative to a position of the product within a frame of the video. Such products may also include books, electronic books, soundtrack albums, etc. that are related to the video. For example, the video may be an adaptation of a book, or the album might be for the soundtrack of the video.

The image data may correspond to images of a performer which are taken when the performer is not performing a particular character. For example, such an image might be taken at an awards ceremony, at a press conference, at an informal setting, and/or elsewhere. Such an image may be a headshot or other image. Multiple generic images may be provided for a particular performer. For example, a performer may have a lengthy career, and the performer's image data may be included for various times within the career.

Although the example of FIG. 5 illustrates automatic replay, it will be understood by one of ordinary skill that similar techniques can be implemented for automatic fast-forwarding.

FIG. 6 illustrates an example process 600 for enabling intelligent video navigation that can be used in accordance with an embodiment. It should be understood that, for any process discussed herein, there can be additional, fewer, or alternative steps performed in similar or alternative orders, or in parallel, within the scope of the various embodiments unless otherwise stated. The process may begin by powering on a computing device and playing a video. The process 600 may continue by obtaining a request to replay a portion of the video as the video is being played 602. For example, a user may cause an intelligent replay button to be displayed and to be selectable, such as by tapping on a touchscreen presenting the video or using a finger to hover over the touchscreen; clicking on the video as it is being played using an input element, such as a mouse, trackpad, or pointer stick; or pressing a remote control button as the video is being played. After the intelligent replay button is displayed, the user can select the intelligent replay button, such as by tapping or clicking on the button. A current frame of the video is then obtained 604. Each previous frame of a plurality of previous frames is analyzed based on video segmentation data 606 to determine an identified frame from which to initiate playback. In some embodiments, the plurality of previous frames to be analyzed can be determined by setting a minimum replay threshold and a maximum replay threshold such that only those frames within this replay window are analyzed. In an embodiment, the minimum replay threshold can be set to 8 seconds and the maximum replay threshold can be set to 12 seconds. In other embodiments, the minimum replay threshold can be set as low as 3 seconds and the maximum replay threshold can be set as high as 5 minutes. In some such embodiments, holding down the replay button or clicking on the replay button multiple times may result in replay from a point further back in time or a point based on coarser segmentation data, such as DVD chapter data. In still other embodiments, only one of a minimum replay threshold or a maximum replay threshold can be set. For instance, in one embodiment, when a current frame corresponds to an action scene, a maximum threshold of fifteen minutes can be set and all of the frames between the current frame and the 21,600 previous frames can be analyzed to determine a beginning of a scene or a shot corresponding to the current frame.

At a decision point 608, it is determined whether one of the previous frames marks the beginning of a scene or a shot. If a previous frame does not mark the beginning of a scene or a shot, a next frame is analyzed. If a previous frame is determined to mark the start of a scene or a shot, that identified frame is selected as the playback point, and playback can be initiated from that frame 610. During playback, the segment between the identified frame and the current frame can be enhanced 612. For example, if the scene includes dialogue, playback can be enhanced with closed captioning. As another example, if the scene corresponds to an action scene, the scene can be played back in slow motion or at a decreased frame rate. In at least some embodiments, the audio data, including speech, can be time-stretched and aligned with the corresponding video data as discussed elsewhere herein. In some embodiments, the enhancement may be the display of extrinsic data, such as cast and character images and information, biographical information, quotations, trivia, goofs, related offerings, scene information, and other extrinsic data.
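
Steps 602 through 612 might be assembled as in the following Python sketch; the per-frame is_shot_start flag and scene tags are assumed to come from upstream segmentation analysis and are not mandated by the figure.

    from dataclasses import dataclass

    @dataclass
    class Frame:
        index: int
        is_shot_start: bool    # from segmentation data (step 606)
        scene_tags: frozenset  # e.g., frozenset({"dialogue"}) or {"action"}

    def intelligent_replay(frames, current_idx, fps=24,
                           min_back_sec=8, max_back_sec=12):
        """Steps 604-610: scan the replay window backward for a shot start
        and return (playback_index, enhancements)."""
        lo = max(0, current_idx - int(max_back_sec * fps))
        hi = max(0, current_idx - int(min_back_sec * fps))
        playback_idx = hi                        # fallback: static minimum replay
        for i in range(hi, lo - 1, -1):          # decision point 608
            if frames[i].is_shot_start:
                playback_idx = i                 # initiate playback here (610)
                break
        tags = frames[current_idx].scene_tags
        enhancements = []                        # step 612
        if "dialogue" in tags:
            enhancements.append("closed_captioning")
        if "action" in tags:
            enhancements.append("decreased_frame_rate")
        return playback_idx, enhancements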

Although the example process 600 is directed towards intelligent replay, it will be appreciated by one of ordinary skill in the art that similar techniques can be utilized for intelligent fast-forwarding. For example, instead of analyzing a set of previous frames for determining the beginning of a scene or a shot, intelligent fast-forwarding can be implemented by analyzing a set of successive frames for determining the end of a scene or a shot. In at least some embodiments, intelligent fast-forwarding can also include time-compression of audio data and alignment with corresponding video data as discussed elsewhere herein.

FIG. 7 illustrates an example process 700 for enabling intelligent video navigation that can be used in accordance with an embodiment. The process may begin by playing a video on a computing device. The process 700 may continue by receiving a request to play back a portion of the video while the video is being played 702. A current frame, at the point that the playback request is made, is obtained 704. Subsequently, each previous frame of a previous set of frames is analyzed based on closed captioning data 706 to determine a playback point. In particular, the previous set of frames is analyzed to determine whether a previous frame marks the beginning of dialogue 708. As in the example process 600 of FIG. 6, the previous set of frames analyzed can be based on a minimum replay threshold and/or a maximum replay threshold to reduce the amount of processing required of process 700. If a previous frame is determined not to mark the beginning of dialogue, a next frame is analyzed. On the other hand, if a previous frame is determined to mark the beginning of dialogue, that identified frame is selected as the point of playback. The video is then replayed from the identified frame 710. During playback, the portion of video between the identified frame and the current frame can be enhanced 712, such as by incorporating closed captioning, playing the segment at a decreased frame rate, or displaying extrinsic data, as discussed elsewhere herein. Although the example process 700 is directed towards intelligent replay, it will be understood by one of ordinary skill that a similar approach can be utilized for intelligent fast-forwarding. For instance, instead of analyzing a set of previous frames for determining the beginning of dialogue, intelligent fast-forwarding can be implemented by analyzing a set of successive frames for determining the end of dialogue.
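
The fast-forward counterpart of process 700 could scan caption cues forward for the end of a dialogue run; as with the earlier caption sketch, the cue format and gap heuristic below are assumptions.

    # Reuses the hypothetical cue format from the earlier caption sketch:
    # cues are (start_sec, end_sec, text) tuples in time order.

    def dialogue_end(current_sec, cues, gap=2.0):
        """Return the time at which the dialogue run containing current_sec
        ends: the end of the last cue before a silent gap longer than `gap`."""
        end = None
        for cue_start, cue_end, _ in cues:
            if cue_end < current_sec:
                continue                  # cue is already behind the current frame
            if end is not None and cue_start - end > gap:
                break                     # a silent gap: the dialogue has ended
            end = cue_end
        return end

    # E.g., with the cues from the earlier sketch, dialogue_end(44.0, CUES)
    # returns 50.2, the end of the sentence, as the skip-ahead target.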

As mentioned, some embodiments enable different functionality based on user interaction with the intelligent replay or fast-forward button. For example, holding down the intelligent replay button can bring up a menu enabling a user to select closed captioning enhancement or slow motion enhancement. As another example, tapping once on the intelligent replay button can cause a segment of video to be replayed at 2× the normal frame rate, tapping twice will cause the segment to be replayed at 4× the normal frame rate, tapping three times will cause the segment to be replayed at 8× the normal frame rate, etc.

The processes 600 of FIG. 6 and 700 of FIG. 7 are directed towards streaming video, and the intelligent replay functionality is provided as part of a streaming service. Other embodiments are directed towards digital video stored on a computing device, and the intelligent replay functionality is provided as part of a digital video playing application. In some embodiments, the digital video playing application may connect to a remote server to obtain data to provide intelligent video navigation functionality, such as video segmentation data or aggregate video navigation data from a community of users. In other embodiments, video segmentation data, aggregate video navigation data, and the like can be encoded in a stored digital video or stored as a separate file associated with the digital video, and the digital video playing application does not require network connectivity.

FIG. 8 illustrates an example computing device 800 that can be used to perform approaches described in accordance with various embodiments. In this example, the device includes four cameras 808, located at the top and bottom on each of the same and opposite sides of the device as a display element 806, enabling the device to capture images in accordance with various embodiments. The computing device also includes an inertial measurement unit (IMU) 812, comprising a three-axis gyroscope, three-axis accelerometer, and magnetometer, that can be used to detect the motion and/or orientation of the device.

FIG. 9 illustrates a logical arrangement of a set of general components of an example computing device 800. In this example, the device includes a processor 902 for executing instructions that can be stored in a memory component 904. As would be apparent to one of ordinary skill in the art, the memory component can include many types of memory, data storage, or non-transitory computer-readable storage media, such as a first data storage for program instructions for execution by the processor 902, a separate storage for images or data, a removable memory for sharing information with other devices, etc. The device typically will include some type of display element 906, such as a touchscreen, electronic ink (e-ink), organic light emitting diode (OLED), or liquid crystal display (LCD), although devices such as portable media players might convey information via other means, such as through audio speakers. In at least some embodiments, the display screen provides for touch or swipe-based input using, for example, capacitive or resistive touch technology. The device in many embodiments will include one or more cameras or image sensors 908 for capturing image or video content. A camera can include, or be based at least in part upon, any appropriate technology, such as a CCD or CMOS image sensor having a sufficient resolution, focal range, and viewable area to capture an image of the user when the user is operating the device. An image sensor can include a camera or infrared sensor that is able to image projected images or other objects in the vicinity of the device. Methods for capturing images or video using a camera with a computing device are well known in the art and will not be discussed herein in detail. It should be understood that image capture can be performed using a single image, multiple images, periodic imaging, continuous image capturing, image streaming, etc.

The device, in many embodiments, will include at least one audio element 910, such as one or more audio speakers and/or microphones. The microphones may be used to facilitate voice-enabled functions, such as voice recognition, digital recording, etc. The audio speakers may perform audio output. In some embodiments, the audio speaker(s) may reside separately from the device. The device, as described above relating to many embodiments, may also include one or more motion and/or orientation elements 912 that provide information such as a position, direction, motion, or orientation of the device. These one or more motion and/or orientation determining elements 912 can include, for example, accelerometers, inertial sensors, electronic gyroscopes, electronic compasses, and GPS elements.

The computing device also includes various power components 914 known in the art for providing power to a computing device, which can include capacitive charging elements for use with a power pad or similar device. The computing device can include one or more communication elements or networking sub-systems 916, such as a Wi-Fi, Bluetooth, RF, wired, or wireless communication system. The device in many embodiments can communicate with a network, such as the Internet, and may be able to communicate with other such devices. In some embodiments the device can include at least one additional input element 918 able to receive conventional input from a user. This conventional input can include, for example, a push button, touch pad, touchscreen, wheel, joystick, keyboard, mouse, keypad, or any other such component or element whereby a user can input a command to the device. In some embodiments, however, such a device might not include any buttons at all and might be controlled only through a combination of visual and audio commands, such that a user can control the device without having to be in contact with the device.

As discussed, different approaches can be implemented in various environments in accordance with the described embodiments. For example, FIG. 10 illustrates an example of an environment 1000 for implementing aspects in accordance with various embodiments. As will be appreciated, although a Web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The system includes an electronic client device 1002, which can include any appropriate device operable to send and receive requests, messages or information over an appropriate network 1004 and convey information back to a user of the device. Examples of such client devices include personal computers, cell phones, handheld messaging devices, laptop computers, set-top boxes, personal data assistants, electronic book readers and the like. The network can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network or any other such network or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Protocols and components for communicating via such a network are well known and will not be discussed herein in detail. Communication over the network can be enabled via wired or wireless connections and combinations thereof. In this example, the network includes the Internet, as the environment includes a Web server 1006 for receiving requests and serving content in response thereto, although for other networks, an alternative device serving a similar purpose could be used, as would be apparent to one of ordinary skill in the art.

The illustrative environment includes at least one application server 1008 and a data store 1010. It should be understood that there can be several application servers, layers or other elements, processes or components, which may be chained or otherwise configured, which can interact to perform tasks such as obtaining data from an appropriate data store. As used herein, the term “data store” refers to any device or combination of devices capable of storing, accessing and retrieving data, which may include any combination and number of data servers, databases, data storage devices and data storage media, in any standard, distributed or clustered environment. The application server 1008 can include any appropriate hardware and software for integrating with the data store 1010 as needed to execute aspects of one or more applications for the client device and handling a majority of the data access and business logic for an application. The application server provides access control services in cooperation with the data store and is able to generate content such as text, graphics, audio and/or video to be transferred to the user, which may be served to the user by the Web server 1006 in the form of HTML, XML or another appropriate structured language in this example. The handling of all requests and responses, as well as the delivery of content between the client device 1002 and the application server 1008, can be handled by the Web server 1006. It should be understood that the Web and application servers are not required and are merely example components, as structured code discussed herein can be executed on any appropriate device or host machine as discussed elsewhere herein.

The data store 1010 can include several separate data tables, databases or other data storage mechanisms and media for storing data relating to a particular aspect. For example, the data store illustrated includes mechanisms for storing content (e.g., production data) 1012 and user information 1016, which can be used to serve content for the production side. The data store is also shown to include a mechanism for storing log or session data 1014. It should be understood that there can be many other aspects that may need to be stored in the data store, such as page image information and access rights information, which can be stored in any of the above listed mechanisms as appropriate or in additional mechanisms in the data store 1010. The data store 1010 is operable, through logic associated therewith, to receive instructions from the application server 1008 and obtain, update or otherwise process data in response thereto. In one example, a user might submit a search request for a certain type of item. In this case, the data store might access the user information to verify the identity of the user and can access the catalog detail information to obtain information about items of that type. The information can then be returned to the user, such as in a results listing on a Web page that the user is able to view via a browser on the user device 1002. Information for a particular item of interest can be viewed in a dedicated page or window of the browser.

Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server and typically will include a computer-readable medium storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure herein.

The environment in one embodiment is a distributed computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 10. Thus, the depiction of the system 1000 in FIG. 10 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.

The various embodiments can be further implemented in a wide variety of operating environments, which in some cases can include one or more user computers or computing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system can also include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices can also include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network.

Most embodiments utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as TCP/IP, OSI, FTP, UPnP, NFS, CIFS and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network and any combination thereof.

In embodiments utilizing a Web server, the Web server can run any of a variety of server or mid-tier applications, including HTTP servers, FTP servers, CGI servers, data servers, Java servers and business application servers. The server(s) may also be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more Web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++ or any scripting language, such as Perl, Python or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase® and IBM®.

The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch-sensitive display element or keypad) and at least one output device (e.g., a display device, printer or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices and solid-state storage devices such as random access memory (RAM) or read-only memory (ROM), as well as removable media devices, memory cards, flash cards, etc.

Such devices can also include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device) and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium representing remote, local, fixed and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory device, including an operating system and application programs such as a client application or Web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by a system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

What is claimed is:
1. A system, comprising: a processor; and memory including instructions that, upon being executed by the processor, cause the system to: obtain a request to replay a portion of a digital video file as the digital video file is being presented; obtain a current frame of the digital video file; analyze a plurality of previous frames of the digital video file between a minimum point in time and a maximum point in time relative to the current frame to determine, based at least in part upon closed captioning corresponding to the plurality of previous frames and the current frame, that an identified frame of the plurality of previous frames corresponds to a beginning of dialogue associated with the current frame; cause the digital video file to be presented from the identified frame; and provide at least one digital video enhancement to one or more frames between the identified frame and the current frame.
2. The system of claim 1, wherein the at least one digital video enhancement includes closed captioning, and the instructions upon being executed further cause the system to: cause the closed captioning to be presented between the identified frame and the current frame; and disable the closed captioning after the current frame.
3. The system of claim 2, wherein the at least one digital video enhancement includes decreasing a first frame rate of the digital video file, and the instructions upon being executed to cause the digital video file to be presented from the identified frame include causing the system to: cause the one or more frames between the identified frame and the current frame to be presented at a decreased frame rate; and cause the current frame and one or more successive frames of the digital video file to be presented at the first frame rate.
4. The system of claim 1, wherein the instructions upon being executed further cause the system to: obtain a second request to forward to a second portion of the digital video file; obtain a second current frame of the digital video file; analyze a plurality of successive frames of the digital video file relative to the second current frame to determine that a second identified frame of the plurality of successive frames corresponds to an end of the dialogue associated with the current frame; and cause the digital video file to be presented from the second identified frame.
5. A computer-implemented method, comprising: obtaining a request to replay a portion of a video; obtaining a current frame of the video; determining an identified frame of a plurality of previous frames of the video from which to initiate replay based at least in part upon closed captioning; and causing the video to be displayed from the identified frame.
6. The computer-implemented method of claim 5, wherein the closed captioning data indicates a beginning of dialogue associated with the current frame.
7. The computer-implemented method of claim 5, wherein the plurality of previous frames comprises frames between a minimum point in time and a maximum point in time relative to the current frame.
8. The computer-implemented method of claim 5, further comprising: causing closed captioning to be displayed between the identified frame and the current frame; and disabling the closed captioning after the current frame.
9. The computer-implemented method of claim 5, wherein the video corresponds to a first frame rate, and causing the video to be played from the identified frame includes: causing one or more frames between the identified frame and the current frame to be displayed at a decreased frame rate; and causing the current frame and one or more successive frames to be displayed at the first frame rate.
10. The computer-implemented method of claim 9, further comprising: generating time-stretched audio data corresponding to the one or more frames between the identified frame and the current frame.
11. The computer-implemented method of claim 5, wherein the current frame corresponds to a first zoom level, and causing the video to be played from the identified frame includes: causing one or more frames between the identified frame and the current frame to be displayed at a modified zoom level; and causing the current frame and one or more successive frames of the video to be displayed at the first zoom level.
12. The computer-implemented method of claim 5, wherein determining the identified frame is further based at least in part upon aggregate video navigation data from a community of users.
13. The computer-implemented method of claim 5, wherein determining the identified frame is further based at least in part upon video segmentation data.
14. The computer-implemented method of claim 5, wherein determining the identified frame is further based at least in part upon one of a musical score, a song, or other audio data corresponding to the current frame.
15. The computer-implemented method of claim 5, further comprising: obtaining a second request to forward to a second portion of the video; obtaining a second current frame of the video; analyzing a plurality of successive frames of the video relative to the second current frame to determine that a second identified frame of the plurality of successive frames corresponds to an end of dialogue associated with the current frame; and causing the video to be presented from the second identified frame.
16. A non-transitory computer-readable storage medium storing instructions that, upon being executed by a processor of a computing device, cause the computing device to: obtain a request to replay a portion of a video; obtain a current frame of the video; determine an identified frame of a plurality of previous frames of the video from which to initiate replay based at least in part upon closed captioning corresponding to the plurality of previous frames and the current frame; and cause the video to be displayed from the identified frame.
17. The non-transitory computer-readable storage medium of claim 16, wherein the video is a streamed video.
18. The non-transitory computer-readable storage medium of claim 16, wherein the video is stored in non-transitory memory of the computing device.
19. The non-transitory computer-readable storage medium of claim 16, wherein the instructions upon being executed further cause the computing device to: cause closed captioning to be displayed between the identified frame and the current frame; and disable the closed captioning after the current frame.
20. The non-transitory computer-readable storage medium of claim 16, wherein the video corresponds to a first frame rate, and the instructions that cause the video to be played from the identified frame include causing the computing device to: cause one or more frames between the identified frame and the current frame to be displayed at a decreased frame rate; and cause the current frame and one or more successive frames to be displayed at the first frame rate.